Glioblastoma multiforme is the most presented and aggressive brain tumor in humans, involving glial cells, with an incidence of 2–3 cases per 100,000 person life-years in Europe and North America (Fonnet E. Bleeker, et al (2012)).
Treatment can involve chemotherapy, radiation and surgery. Median survival with standard-of-care radiation and chemotherapy with the alkylating agent temozolomide is 15 months(Johnson, Derek R. et al (2011)) while the median survival without treatment is 4 and a half months. Regretfully glioblastomas are notorious for resistance to therapy, which has been attributed to DNA-repair proficiency, a multitude of deregulated molecular pathways, and, more recently, to the particular biologic behavior of tumor stem-like cells. Here, based on the reference work of Murat, A. et al (2008), we aimed to identify the molecular profiles specific for treatment resistance to the current standard of care of concomitant chemoradiotherapy with temozolomide.
To achieve our goal, we take from the reference study a set of gene expression profiles of 80 glioblastomas of patients treated whithin clinical trials of concomitant and adjuvant temozolomide to radiotherapy (n=52) and patients treated with only radioterapy (n=28). In addition, 4 control patients were added to the study.
This document should be processed from R and you need to install the packages knitr and markdown. Once they are installed, you have to type the following instructions that generate a HTML document that you can open with a web browser:
library(knitr) ## required for "knitting" from Rmd to md
library(markdown) ## required for processing from md to HTML
knit("projectTemplate.Rmd", "projectTemplate.md") ## process Rmd to md
markdownToHTML("projectTemplate.md", "projectTemplate.html") ## process md to HTML
browseURL("projectTemplate.html") ## open the resulting HTML file from R
The microarray data used during the research of Murat, A. et al (2008) can be found at the Gene Expression Omnibus, with the accession number GSE7696. In order to proceed with this analysis, the file GSE7696_RAW.tar must be uncompressed under the directory ‘data/raw/’ and the file GSE7696_series_matrix.txt.gz must be placed inside ‘data/’.
The original data contained 84 samples, of which 4 are controls, 28 correspond to patients treated with radiotherapy, and 52 to patients treated with both TMZ and radiotherapy. In order to have a more manageable analysis we decided to proceed, by now, with only 21 samples. The following script was used to do a subsampling keeping the original proportions among controls, TMZ/radiotherapy and radiotherapy samples.
We need GEOquery to retrieve the assay data and affy to handle data from Affymetrix 3’-biased Arrays.
library(affy)
library(GEOquery)
This is a simple way to cache the full eset (all the assay data) in a simple object, to avoid loading it from the .txt.gz the next time (a costly process)
# Cache the data to avoid having to load it again next time from the .txt
if (file.exists('full_eset.rds')) {
eset = readRDS('full_eset.rds')
} else {
eset = getGEO(filename='data/GSE7696_series_matrix.txt.gz')
saveRDS(eset, 'full_eset.rds')
}
We separate the sample names by treatment group, and take enough samples from each group to have a total of 21, keeping the original proportions.
# extract sample names
samples = colnames(eset)
# separate the samples by treatment group
no_treated = samples[eset$characteristics_ch1.5 == "treatment: NA"]
tmz_rt = samples[eset$characteristics_ch1.5 == "treatment: TMZ/radiotherapy"]
rt = samples[eset$characteristics_ch1.5 == "treatment: radiotherapy"]
# take 21 samples, keeping the proportions of the original dataset
set.seed(123456)
no_treated = sample(no_treated, size=1)
tmz_rt = sample(tmz_rt, size=13)
rt = sample(rt, size=7)
Join the three arrays of sample names, and turn them into file names. Then read only the necessary CEL files. This allows us to retrieve the data much faster, as we are loading only the 21 required samples, and not the whole set of 84 samples.
subsample_names = c(no_treated, tmz_rt, rt)
# turn sample names into filenames
subsample_filenames = paste(subsample_names, '.cel.gz', sep='')
# read only the necessary the expression data
glioData = ReadAffy(celfile.path = "./data/raw", filenames=subsample_filenames)
sampleNames(glioData) = letters[1:21]
# display used samples:
subsample_names
## [1] "GSM187180" "GSM187213" "GSM187178" "GSM187175" "GSM187232"
## [6] "GSM187163" "GSM187188" "GSM187158" "GSM187222" "GSM187161"
## [11] "GSM187206" "GSM187184" "GSM187211" "GSM187208" "GSM187234"
## [16] "GSM187229" "GSM187225" "GSM187186" "GSM187192" "GSM187214"
## [21] "GSM187185"
We also filter the original assay data to keep only the data corresponding to the samples we have choosen. We finally save both objects for posterior use.
subsampled_eset = eset[, subsample_names]
# save the subsampled data for posterior use
if (!file.exists('eset.rds')) {
saveRDS(subsampled_eset, 'eset.rds')
}
if (!file.exists('glioData.rds')) {
saveRDS(glioData, 'glioData.rds')
}
eset = subsampled_eset
library(affyPLM)
## Loading required package: gcrma
## Loading required package: preprocessCore
We plot, in the first place, the chip images in order to detect artifacts and discard those samples corresponding to malfunctioning chips.
# Scanner plot analysis
par(mfrow = c(3, 7), mar = c(1, 1, 3, 1))
image(glioData)